The analysis of the modular structure of networks is a major challenge in complex networks theory. The validity of the modular structure obtained is essential to confront the problem of the topology-functionality relationship. Recently, several authors have studied the resolution limit of different community detection algorithms, which makes the detection of natural modules impossible when very different topological scales coexist in the network. Existing multiresolution methods are not a panacea and also fail in extreme situations. Here, we present a new hierarchical multiresolution scheme that works even when the network decomposition is very close to the resolution limit. The idea is to apply the multiresolution method separately to optimal subgraphs of the network, focusing the analysis on each part independently. We also propose a new algorithm that reduces the computational cost of screening the mesoscale in search of the resolution parameter that best splits every subgraph. The hierarchical algorithm is able to solve a difficult benchmark proposed in [Lancichinetti & Fortunato, 2011], encouraging further analysis of hierarchical methods based on the modularity quality function.
1. Introduction

The quality function called modularity has been widely used to assess the modular structure of networks [Girvan & Newman, 2002; Newman & Girvan, 2004; Newman, 2004a; Clauset et al., 2004; Duch & Arenas, 2005; Danon et al., 2005] and for data clustering and exploration [Newman, 2006; Granell et al., 2011]. Modularity is a global descriptor of a complex network that measures the difference between a given partition of the network and the same partition in an ensemble of randomized versions of the original network that preserve the local strength of every node. The optimization of modularity is coherently related to the definition of modules in the network; a module is defined as the result of the optimal modularity partition. In 2007, Fortunato & Barthelemy [Fortunato & Barthelemy, 2007] pointed out a drawback of this function: a resolution limit (later generalized in [Kumpula et al., 2007; Good et al., 2010]) beyond which the optimization of modularity is unable to identify certain modules, even those easily detectable at first sight, such as cliques almost disconnected from the rest of the network. This effect is known as the resolution limit of modularity. The problem arises because modularity fixes a global scale that may be appropriate for some networks but not for others, and it is especially unsuitable for networks in which large and small dense communities coexist. After this work, a multiresolution method was introduced in [Arenas et al., 2008], which preserved the use of modularity with the addition of a parameter to control the resistance of nodes to form communities. The idea is that the analysis of communities may be performed at different scales of description, and the resolution limit is overcome just by moving to the right scale. Other methods to overcome the resolution limit are found in [Reichardt & Bornholdt, 2004; Pons & Latapy, 2011; Traag et al., 2011; Berry et al., 2011; Ronhovde & Nussinov, 2009, 2010].



A recent work by [Lancichinetti & Fortunato, 2011] shows that even the methods designed to avoid the resolution limit do have a resolution limit, and proposes the use of an algorithm composed of several approaches, called OSLOM [Lancichinetti et al., 2011], to really avoid this resolution problem. The proof that multiresolution schemes still have a resolution limit is carried out analytically for the RB (after Reichardt-Bornholdt) method, and extended qualitatively using examples to the AFG (after Arenas-Fernandez-Gomez) method and the recent CPM (Constant Potts Model) method.

We have performed extensive simulations using the AFG method and conclude that the authors of [Lancichinetti & Fortunato, 2011] are right: the AFG method also has a resolution limit. The benchmark they propose (see Fig. 1), consisting of a giant Erdős-Rényi (ER) network and two small cliques, connected between them by just one link, cannot be separated, in the current formulation of the AFG method, into one cluster for the giant ER network and one cluster for each of the cliques. Even though the synthetic benchmarks on which multiresolution methods may fail are far from the structure of real networks, it is still challenging to investigate what the problems are and how to solve them.

In this paper we focus on the AFG method, analyzing its performance in resolution-limited situations and proposing alternatives to eliminate or, if that is not possible, minimize the effect of this limit. The alternative we present, a hierarchical application of the resolution screening, avoids the resolution limit. The hierarchical application of a multiresolution method consists in focusing the screening on the different clusters of the network as soon as these clusters are detected.



2. Multiresolution AFG method 

In a previous work, the authors proposed a method that allows the full screening of the topological structure at different resolution levels using the original formulation and semantics of modularity as defined in [Girvan & Newman, 2002]. The original modularity allows the comparison of different partitions of the network. Given a network partitioned into communities, with $C_i$ being the community to which node $i$ is assigned, the mathematical definition of modularity is



$$Q[w_{ij}, C] = \frac{1}{2w} \sum_i \sum_j \left( w_{ij} - \frac{w_i w_j}{2w} \right) \delta(C_i, C_j) \qquad (1)$$



where $w_{ij}$ is the weight of the link between nodes $i$ and $j$ (zero if no link exists), $w_i = \sum_j w_{ij}$ is the strength of node $i$ and $2w = \sum_i w_i$ is the total strength of the network [Newman, 2004a]. The Kronecker delta function $\delta(C_i, C_j)$ takes the value 1 if nodes $i$ and $j$ are in the same community and 0 otherwise. Several authors have attacked the problem of modularity optimization, with considerable success, by proposing different heuristics [Newman, 2004b; Clauset et al., 2004; Guimerà & Amaral, 2005; Duch & Arenas, 2005; Pujol et al., 2006; Newman, 2006]; see [Fortunato, 2010] for a review.
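As an illustration of Eq. (1), the following minimal sketch evaluates modularity directly from a dense weight matrix. The function names and the toy network (two triangles joined by one link) are hypothetical choices made for this example; the double loop is a literal transcription of the double sum and is not meant to be efficient.

```python
def modularity(w, labels):
    """Modularity Q[w_ij, C] of Eq. (1).

    w      -- symmetric weight matrix as a list of lists (w[i][j] = 0 if no link)
    labels -- labels[i] is the community C_i of node i
    """
    n = len(w)
    strength = [sum(row) for row in w]   # w_i = sum_j w_ij
    total = sum(strength)                # 2w = sum_i w_i
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:   # Kronecker delta(C_i, C_j)
                q += w[i][j] - strength[i] * strength[j] / total
    return q / total


def two_triangles():
    """Toy network (hypothetical): triangles {0,1,2} and {3,4,5} joined by link 2-3."""
    links = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
    w = [[0.0] * 6 for _ in range(6)]
    for i, j in links:
        w[i][j] = w[j][i] = 1.0
    return w


if __name__ == "__main__":
    w = two_triangles()
    print(modularity(w, [0, 0, 0, 1, 1, 1]))  # one community per triangle
    print(modularity(w, [0] * 6))             # whole network as one community
```

For the toy network, the partition into the two triangles gives Q = 5/14, while the single-community partition gives Q = 0, as it does for any network.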



Fig. 1. Benchmark proposed by [Lancichinetti & Fortunato, 2011] to test the resolution limit of multiresolution methods. The large component is an ER network of 400 nodes with average degree $\langle k \rangle = 100$ linked to two cliques of 13 nodes each, sharing only one link between them. The goal is to separate these three subgraphs using a community detection algorithm aimed at detecting multiple resolutions.

The AFG method was designed to evaluate the community structure of networks using a kind of magnifying glass of the topology [Arenas et al., 2008]. The mathematical form of this prescription is given by

$$Q_{AFG}[w_{ij}, C, r] = Q[w_{ij} + r\delta_{ij}, C] , \qquad (2)$$

where the resistance $r$ is the parameter controlling the resolution of the partitions we want to find, and $w_{ij} + r\delta_{ij}$ is the new weights matrix after the addition of a self-loop with value $r$ to each node. When $r$ is zero, we recover the standard modularity $Q$. The definition of $Q_{AFG}$ preserves the original semantics of modularity.
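A minimal sketch of Eq. (2) follows: the resistance is added to the diagonal of the weight matrix and the result is passed to the standard modularity of Eq. (1). The function names and the toy network are hypothetical. The sketch applies Eq. (1) literally, so it is only meant for $r \geq 0$; it does not cover negative resistances.

```python
def modularity(w, labels):
    """Standard modularity of Eq. (1) for a symmetric weight matrix w."""
    n = len(w)
    strength = [sum(row) for row in w]   # w_i, self-loop included once
    total = sum(strength)                # 2w
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                q += w[i][j] - strength[i] * strength[j] / total
    return q / total


def modularity_afg(w, labels, r):
    """Q_AFG[w_ij, C, r] = Q[w_ij + r*delta_ij, C], Eq. (2), assuming r >= 0."""
    n = len(w)
    # Add a self-loop of weight r to every node (the diagonal of the matrix).
    shifted = [[w[i][j] + (r if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    return modularity(shifted, labels)


def two_triangles():
    """Toy network (hypothetical): triangles {0,1,2} and {3,4,5} joined by link 2-3."""
    links = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
    w = [[0.0] * 6 for _ in range(6)]
    for i, j in links:
        w[i][j] = w[j][i] = 1.0
    return w
```

On the toy network, increasing $r$ raises the modularity of the partition into single nodes, which is how larger resistances favor smaller communities.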

A refinement of the AFG method may be found in [Granell et al., 2011, 2012 in press], where the original formulation of modularity, Eq. (1), is replaced by its extension to networks with positive and negative weights [Gomez et al., 2009; Traag & Bruggeman, 2009]. Although the differences are usually small, this is necessary since access to the macroscale requires negative values of the resistance, even if the original network has only positive weights. Thus, the adequate formulation of modularity Eq. (1) for undirected weighted signed networks, which should be used, is
are the positive and negative strengths of node $i$, and

$$2w^+ = \sum_i w_i^+ \qquad (6)$$

$$2w^- = \sum_i w_i^- \qquad (7)$$

are the positive and negative total strengths, respectively. Please note that these four strengths are defined to be non-negative. The extension to directed networks [Arenas et al., 2007] is simply obtained by the substitutions in Eq. (3)
For the sake of simplicity, we will refer to the undirected case for the rest of the paper. In the particular case that the original network does not have negative weights, $r < 0$ and $w_{ii} = 0$, $\forall i$, Eq. (3) reads
where $N$ is the total number of nodes, $n_s$ is the number of nodes in community $s$ of the partition $C$, and the node and total strengths refer to the original network before the addition of the self-loops. It is interesting to realize that, since all the negative strengths are equal to the absolute value of $r$, the contribution of the resistance to modularity is equivalent to a constant Potts model [Traag et al., 2011].
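As a worked check of this special case, the following derivation assumes that the signed modularity has the form usually attributed to [Gomez et al., 2009]. That form, and the intermediate steps, are an assumption made for illustration and should be compared against the cited references.

```latex
% Assumed form of the signed modularity (undirected, weighted):
Q = \frac{1}{2w^+ + 2w^-} \sum_i \sum_j
    \left[ w_{ij} - \left( \frac{w_i^+ w_j^+}{2w^+} - \frac{w_i^- w_j^-}{2w^-} \right) \right]
    \delta(C_i, C_j)

% Positive original weights, no self-loops, self-loops of value r < 0 added:
%   w_i^+ = w_i, \quad w_i^- = |r|, \quad 2w^+ = 2w, \quad 2w^- = N|r|

% The three contributions, summed over pairs in the same community:
\sum_{ij} (w_{ij} + r\delta_{ij}) \, \delta(C_i, C_j) = \sum_s 2w_{ss} - N|r|
\sum_{ij} \frac{w_i w_j}{2w} \, \delta(C_i, C_j) = \sum_s \frac{w_s^2}{2w}
\sum_{ij} \frac{|r|\,|r|}{N|r|} \, \delta(C_i, C_j) = \frac{|r|}{N} \sum_s n_s^2

% Collecting terms (2w_{ss}: internal strength of community s, w_s: its total strength):
Q_{AFG}[w_{ij}, C, r] = \frac{1}{2w + N|r|}
    \left[ 2w \, Q[w_{ij}, C] - |r| \left( N - \frac{1}{N} \sum_s n_s^2 \right) \right]
```

Under this assumption, the resistance enters only through a term that depends on the community sizes $n_s$ and not on the links, which is consistent with the constant Potts model interpretation. For the single-community partition, $n_s = N$ and the whole bracket vanishes.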



Resolving the substructure of networks using a single parameter, as proposed in the AFG method, still has a resolution problem. As pointed out by [Lancichinetti & Fortunato, 2011], when modules of very different sizes coexist, multiresolution methods will tend to break the larger groups before finding the smaller ones. The phenomenon is easy to understand with an example: let us imagine an image with a real-size elephant and an ant. To see the details of the ant we have to get so close to the image that the elephant disintegrates into smaller parts, and only part of the elephant is seen when focusing on the ant. In terms of modularity, we are trying to unravel those areas which are denser in terms of links with respect to other areas in the network. A way to determine whether we could have resolution problems is to plot a link density map and detect whether there are sharp contrasts. If very different topological scales coexist, there will also be jumps in the clustering coefficient. In the example provided by [Lancichinetti & Fortunato, 2011], which consists of an ER network of 400 nodes with average degree 100 linked to two cliques of 13 nodes only by a unique link between them (see Fig. 1), the clustering coefficient presents a drastic separation of scales, see Fig. 2. This indicates small zones of the network very densely connected and a wide area not so dense, corresponding to the cliques and the ER, respectively.
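The separation of scales in the clustering coefficient can be reproduced with a short sketch. The benchmark builder below is an assumption: the text does not fully specify the wiring, so here the two cliques share one link and each clique is attached to the ER part by one link. The function names, the seed, and the choice of attachment nodes are hypothetical.

```python
import random
from itertools import combinations


def build_benchmark(n_er=400, p=0.25, clique_size=13, seed=0):
    """ER graph G(n_er, p) plus two cliques; returns adjacency sets.

    With n_er = 400 and p = 0.25 the expected average degree is about 100.
    Wiring between the three parts is assumed (see lead-in).
    """
    rng = random.Random(seed)
    n = n_er + 2 * clique_size
    adj = {v: set() for v in range(n)}

    def link(a, b):
        adj[a].add(b)
        adj[b].add(a)

    for a, b in combinations(range(n_er), 2):
        if rng.random() < p:
            link(a, b)
    first = range(n_er, n_er + clique_size)
    second = range(n_er + clique_size, n)
    for clique in (first, second):
        for a, b in combinations(clique, 2):
            link(a, b)
    link(first[0], second[0])  # single link between the cliques
    link(0, first[1])          # assumed link ER - first clique
    link(1, second[1])         # assumed link ER - second clique
    return adj


def local_clustering(adj, v):
    """Fraction of pairs of neighbours of v that are themselves linked."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0
    closed = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2.0 * closed / (k * (k - 1))


if __name__ == "__main__":
    adj = build_benchmark()
    er = [local_clustering(adj, v) for v in range(400)]
    cliques = [local_clustering(adj, v) for v in range(400, 426)]
    print("mean clustering, ER part:", sum(er) / len(er))
    print("mean clustering, cliques:", sum(cliques) / len(cliques))
```

Clique nodes without external links have clustering exactly 1, while ER nodes have clustering close to the link probability $p$, which is the kind of jump described above.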



3. Hierarchical Multiresolution method 

Our approach to solving the resolution problem takes advantage of the capability of the AFG method to find meaningful communities from the initial steps of the mesoscale analysis. More precisely, we propose the use of an iterative scheme which combines the optimization of modularity close to the macroscale of the network with its splitting into subgraphs, one for each of the previously found communities.

Supposing that our network is undirected, weighted, with positive weights and no self-loops, the prescription of our algorithm is the following:

(1) Start out from the macroscale partition $\mathcal{M}$, which has only one community containing all nodes. Then, find the upper bound of this macroscale, which is the minimum value of the resistance parameter ($r_{\min}$) needed to find a partition $C$ of the network with optimal modularity $Q_{AFG}[w_{ij}, C, r_{\min}]$ formed by more than one community.

(2) Split the network into the subgraphs defined by the partition $C$ just found.

(3) Repeat the previous steps with each subgraph until no further subdivisions are needed.

This algorithm defines a hierarchical organization of the nodes, where the values of $r_{\min}$ at each splitting define the ultrametric distances between nodes, i.e. the heights in the dendrogram at which every pair of nodes first meet.

Fig. 2. Clustering coefficient for the benchmark of Fig. 1. Note the sharp transition in the relative local density of links represented by the clustering coefficient. This is indeed a hint of the coexistence of very different topological scales in the network.
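The three steps of the hierarchical scheme can be sketched as a short recursion. Here `first_split` is a hypothetical callback standing in for step (1): given the nodes of a subgraph, it is assumed to return the resistance of the first split together with the communities found, or `None` when no further subdivision is needed. The stopping criterion itself is left to that callback. The toy callback in the usage example uses made-up values.

```python
def hierarchical_split(nodes, first_split):
    """Build the hierarchy as a nested dict (a dendrogram in tree form).

    nodes       -- iterable of node identifiers of the current subgraph
    first_split -- hypothetical callback: nodes -> (r_min, list_of_node_lists)
                   or None when the subgraph should not be divided further
    """
    nodes = sorted(nodes)
    result = first_split(nodes)                  # step (1)
    if result is None:
        return {"nodes": nodes, "r_min": None, "children": []}
    r_min, parts = result                        # step (2): subgraphs of partition C
    children = [hierarchical_split(part, first_split) for part in parts]  # step (3)
    return {"nodes": nodes, "r_min": r_min, "children": children}


def toy_first_split(nodes):
    """Made-up splitter for illustration only: one split of six nodes in two halves."""
    table = {(0, 1, 2, 3, 4, 5): (-2.0, [[0, 1, 2], [3, 4, 5]])}
    return table.get(tuple(nodes))


if __name__ == "__main__":
    tree = hierarchical_split(range(6), toy_first_split)
    print(tree["r_min"], [child["nodes"] for child in tree["children"]])
```

The `r_min` stored at each internal node of the tree plays the role of the dendrogram height at which the nodes of its children first meet.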

The calculations of $r_{\min}$ and $C$ may be performed simultaneously, therefore avoiding the costly scanning of the whole mesoscale between the lower and upper bounds of the resistance [Granell et al., 2011, 2012 in press]. This is a consequence of the following properties:

The value of $r_{\min}$ is negative, the only exception being the case in which the network is just a clique.

$Q_{AFG}[w_{ij}, \mathcal{M}, r] = 0$, $\forall r < 0$, because, when all nodes belong to a single community, the sum of the weights over all pairs of nodes exactly cancels the sum of the null-model terms. In fact, modularity Eq. (3) is always zero for $\mathcal{M}$, no matter the network or the value of the self-loops.

Since $Q_{AFG}[w_{ij}, \mathcal{M}, r] = 0$ and modularity is a continuous and monotonically increasing function of the resistance for any given $C \neq \mathcal{M}$, the optimal partition $C$ at $r_{\min}$ must satisfy $Q_{AFG}[w_{ij}, C, r_{\min}] = 0$.

For any given partition $C$, the minimum meaningful value of the resistance $r_{\min}(C)$ is the one for which $Q_{AFG}[w_{ij}, C, r_{\min}(C)] = 0$. Thus, Eq. (11) leads to

$$r_{\min}(C) = \frac{-2w}{N - \frac{1}{N} \sum_s n_s^2} \, Q[w_{ij}, C] . \qquad (13)$$

The upper bound of the macroscale is given by

$$r_{\min} = \min_C \{ r_{\min}(C) \} \qquad (14)$$

and $C$ is the partition which minimizes $r_{\min}(C)$.

Fig. 3. Example of the evolution of the FTR algorithm finding $r_{\min}$ in four iterations of the scheme. We start at $r_1 = 0$; optimizing modularity we find $Q_1$; we look for the $r_{\min}$ corresponding to the partition found at $Q_1$ using Eq. (13) and label it $r_2$; the process follows with $Q_2 \to r_3 \to Q_3 \to \dots \to Q_{\max}$ up to finding $r_{\min}$. Beyond this value, the only partition we will find corresponds to the whole network as a unique module. The curves in different colors are the values of $Q[w_{ij}, C, r]$ for different partitions.

All these properties may be combined in the following fast-tracking resistance (FTR) algorithm to find the upper bound of the macroscale:

(1) Optimize modularity at $r = 0$, to obtain partition $C_{prev}$.

(2) Calculate $r_{\min}(C_{prev})$ using Eq. (13).

(3) Optimize modularity at $r := r_{\min}(C_{prev})$, to obtain the current partition $C_{curr}$.

(4) If $C_{curr} = C_{prev}$ or $C_{curr} = \mathcal{M}$, then stop, with $r_{\min} := r_{\min}(C_{prev})$ and $C := C_{prev}$.

(5) Otherwise, let $C_{prev} := C_{curr}$ and go back to the second step.
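The five steps above can be sketched on a toy network as follows. Several pieces are assumptions made for the example: a brute-force search over all partitions replaces the modularity optimization heuristic (feasible only for a handful of nodes); for $r \leq 0$ the sketch uses the closed form $Q_{AFG} = [2w\,Q - |r|(N - \frac{1}{N}\sum_s n_s^2)] / (2w + N|r|)$, which is the expression whose zero gives $r_{\min}(C) = -2w\,Q[w_{ij}, C] / (N - \frac{1}{N}\sum_s n_s^2)$; and all function names are hypothetical.

```python
def set_partitions(items):
    """Yield every partition of the list `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        for k in range(len(part)):          # put `first` into an existing block
            yield part[:k] + [[first] + part[k]] + part[k + 1:]
        yield [[first]] + part              # or into a new block of its own


def scaled_q(w, blocks):
    """Return (2w * Q[w_ij, C], 2w) for the original network."""
    strength = [sum(row) for row in w]
    total = sum(strength)
    value = 0.0
    for block in blocks:
        internal = sum(w[i][j] for i in block for j in block)
        s = sum(strength[i] for i in block)
        value += internal - s * s / total
    return value, total


def size_term(n, blocks):
    """N - (1/N) * sum_s n_s^2; zero only for the single-community partition."""
    return n - sum(len(b) ** 2 for b in blocks) / n


def q_afg(w, blocks, r):
    """Assumed closed form of Q_AFG for r <= 0 (see lead-in)."""
    n = len(w)
    value, total = scaled_q(w, blocks)
    return (value - abs(r) * size_term(n, blocks)) / (total + n * abs(r))


def r_min_of(w, blocks):
    """Resistance at which Q_AFG of the given partition vanishes, Eq. (13)."""
    value, _ = scaled_q(w, blocks)
    return -value / size_term(len(w), blocks)


def brute_force_optimize(w, r):
    """Stand-in optimizer: exhaustive search, only for very small networks."""
    nodes = list(range(len(w)))
    return max(set_partitions(nodes), key=lambda blocks: q_afg(w, blocks, r))


def canon(blocks):
    return frozenset(frozenset(b) for b in blocks)


def ftr(w, optimize=brute_force_optimize):
    """Fast-tracking resistance: returns (r_min, partition)."""
    whole = canon([list(range(len(w)))])
    prev = optimize(w, 0.0)                       # step (1)
    if canon(prev) == whole:                      # no split found even at r = 0
        return None, prev
    while True:
        r = r_min_of(w, prev)                     # step (2)
        curr = optimize(w, r)                     # step (3)
        if canon(curr) == canon(prev) or canon(curr) == whole:
            return r, prev                        # step (4)
        prev = curr                               # step (5)


def two_triangles():
    """Toy network (hypothetical): triangles {0,1,2} and {3,4,5} joined by link 2-3."""
    links = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
    w = [[0.0] * 6 for _ in range(6)]
    for i, j in links:
        w[i][j] = w[j][i] = 1.0
    return w
```

For the two-triangle toy network, the optimization at $r = 0$ already returns the two triangles, Eq. (13) gives $r_{\min}(C) = -5/3$, and the second optimization yields no new partition, so the loop stops after one pass.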

In practice, this algorithm converges in a small number of steps. It stops when a value of $r$ is found such that the optimization of modularity does not produce any new partition. In this case, the modularity of both $C_{prev}$ and $\mathcal{M}$ is zero, and no known partition can be used to obtain a better upper bound of the macroscale. Of course, we cannot claim that we have found the "real" $r_{\min}$, since no optimization heuristic can ensure the finding of the global maximum of modularity (this problem is known to be NP-hard, see [Brandes et al., 2008]), but this is the best approximation one may obtain. To exemplify the functioning of the FTR algorithm, we show in Fig. 3 its application to the first hierarchical splitting of the Zachary karate club network [Zachary, 1977].

Fig. 4. Dendrogram resulting from the application of the hierarchical multiresolution method on the benchmark of Fig. 1. The grey region shows the range of the resistance parameter in which the three communities searched coexist. Note that the vertical lines are not scaled.



4. Results and discussion 

We have applied the hierarchical multiresolution method explained above to the benchmark proposed by [Lancichinetti & Fortunato, 2011], shown in Fig. 1. We use the FTR algorithm to speed up the process of finding the minimal $r$ at which every subgraph splits. The aim is to find the partition into three communities in which the giant ER and each clique are separated. These three communities should contain the nodes labeled 1 to 400, 401 to 413 and 414 to 426, respectively. As stated in the method, we have started out from the macroscale $\mathcal{M}$ of the network, which contains the 426 nodes. The optimal partition splits in two communities at a value of the resistance parameter of -12.5, yielding a community formed by the nodes from 1 to 400 and another community containing the 26 nodes corresponding to the two cliques. Performing the hierarchical method on the two communities obtained, we find that the community containing the 26 nodes rapidly splits in two communities of 13 nodes, at a value of the resistance equal to -11.69. The community containing 400 nodes splits in two at a much greater value of the resistance parameter, which is -8.97. After that, the hierarchical multiresolution method is applied to every community found, until no further divisions are needed. The results of this example are shown in a dendrogram representation in Fig. 4.

Observing this figure, we find that there is a region of the resistance parameter in which the three communities we were hoping to find coexist. This happens because the two cliques form their own communities well before the community of 400 nodes is split in two. Note that this result cannot be obtained using the original multiresolution AFG method exploring the whole mesoscale, because of the resolution limit emerging from the coexistence of very different topological scales. The rationale behind the success of the hierarchical method in this situation is the following: the separation of the network into optimal subgraphs, each one split and independently analyzed through the multiresolution scheme, reduces the global resolution limit. This resolution limit depends on the number of nodes and the number of links in the whole structure. The multiresolution method is able to focus the attention on lower scales while other parts of the network are being screened independently at larger values of the resolution parameter $r$.



5. Conclusions 



We have presented a hierarchical multiresolution method able to cope with networks where the resolution limit would make other schemes fail, finding the natural communities as defined by [Fortunato & Barthelemy, 2007]. The method is boosted by a mechanism that allows the determination, in a few steps, of the resolution parameter $r$ at which to optimize modularity. The results in solving the difficult separation of the benchmark proposed in [Lancichinetti & Fortunato, 2011] are encouraging and open the door to further investigation of modularity-based community detection methods that escape the implicit resolution limit.